1. INTRODUCTION

Agriculture is a major contributor to India's economic growth, which makes research in the agricultural
domain important. As the Indian population grows, so does the demand for crop yield. To increase both yield
and farmers' profit, the most suitable and profitable crop should be cultivated. This can be achieved by adding
a financial dimension, return on investment (ROI), which helps farmers make more accurate and informed crop
selection decisions based on profit and loss under the current market price and demand [1]. A transition from
traditional thinking to more advanced thinking is required. It can be supported by putting accurate information
at the farmer's fingertips through knowledge discovery with modern technologies such as machine learning (ML),
deep learning (DL), and the internet of things (IoT) [2]. In this paper we present the development of an ROI
framework that uses efficient machine learning techniques to improve the performance of a crop recommendation
system. The emphasis is on the agricultural problems and prospects of Yeola taluka, located in the Nashik
district of Maharashtra state. Rainfall is unevenly distributed in this study area, and its socio-economic
status is primarily bound to agriculture. In our study we found low land productivity, water scarcity,
traditional farming methods, uneven climatic changes, economic backwardness of farmers, fragmentation of farms,
and very low market prices for agricultural products. These basic problems of the region motivated us to carry
out research that can increase the economic growth of its farmers by providing a more efficient crop
recommendation system [3]. The ROI framework is designed by integrating a crop yield prediction (CYP) system
and a crop price prediction (CPP) system. It is developed using a data set collected from the Yeola region with
the help of different government agencies. We applied data analysis to the collected data set to obtain
regression statistics. For CYP we implemented a multiple regression model with the soil fertility index (SFI)
as the most important feature and climatic factors as the other attributes; for CPP we used different levels of
market price (minimum, maximum, and average) [4]. We evaluated the performance of different regression
algorithms, and the results show that the improved sequential minimal optimization (SMO) and multilayer
perceptron (MLP) regression models perform better than the other machine learning regression algorithms. The
MLP model is then further optimized by applying hyperparameter tuning [5]. Much prior research has used
traditional machine learning algorithms such as support vector machine (SVM), naive Bayes (NB), random forest
(RF), and decision tree (DT) to analyze and predict crops based on soil and weather parameters; a detailed
description of this work is given in [6], [7]. The main disadvantage is that, without an optimization
technique, these algorithms do not give their best performance. Our current work addresses this by applying
hyperparameter tuning for accurate model selection.

Some authors have worked on neural network algorithms, hybrid approaches combining several machine
learning algorithms, boosting and bagging techniques, adaptive clustering methods, and association mining
techniques for crop recommendation systems. However, this body of work lacks the financial dimension, i.e.,
ROI, an important component that can improve the economic growth of farmers when accurate information is
provided to them and to other agriculture experts through intelligent modelling in the crop recommendation
framework [8]-[10]. Other authors have worked on ontology-based farming and on the analysis of agricultural
data using data mining techniques [11], [12]. In our work we have developed an ROI framework that integrates
the CYP and CPP systems to recommend a more suitable crop to farmers.

In [13], a crop recommendation system was studied using convolutional neural networks (CNN), one of the
most widely used deep learning algorithms. The results show that no specific conclusion can be drawn as to
which model is best, although they clearly show that some machine learning models are used more than others.
We integrate two different systems, CPP and CYP, to obtain the ROI framework, whose output is a continuous
real value. In [14], the researchers carried out an extensive experimental survey of regression methods using
all the regression data sets of the University of California, Irvine (UCI) machine learning repository. In
this survey they evaluated more than 77 regression models belonging to 19 different families, such as nearest
neighbors, regression trees and rules, RF, bagging and boosting, neural networks, DL, and support vector
regression.

In our experiments we observed that the sequential minimal optimization (SMO) regressor (the SMO
algorithm for SVM regression) and the MLP regressor work more efficiently than other regression techniques
such as the bagging regressor, Gaussian regressor, RF regressor, and AdaBoost regressor [15]. In that paper,
the researchers address the SVM regression problem and an iterative algorithm, called SMO, for solving it.
This algorithm is an extension of the SMO algorithm proposed by Platt for SVM classifier design. They suggest
two modifications of the SMO algorithm that address a limitation of the original algorithm by efficiently
maintaining and updating two threshold parameters. Their computational experiments show that these
modifications speed up the SMO algorithm significantly in most situations.

In [16], the researchers employed an MLP regressor for multiple linear regression analysis and an
artificial neural network (ANN) as tools for performance measurement. In [17], the researchers conclude from
statistical analysis that, compared with the parametric model, the ANN gives better results and is a better
modelling technique to support decision making for various types of recommendations. Nashik is a major
agriculturally dominant district in Maharashtra. It is therefore important to highlight its less developed
agricultural regions and to promote agricultural development there. Our present work is an attempt in the
same direction, but at the taluka level, for which we have selected the Yeola region. The topographical study
of Nashik district at the tehsil level titled “Spatial analysis of agricultural development in Nashik
district: A Tahsil level study” [18] helped us identify research challenges and understand the topographical
conditions of the Yeola region, so that we could move forward in the proper direction.


2. RESEARCH METHOD FOR RETURN ON INVESTMENT (ROI) DIMENSION

The ROI framework is designed using the CYP system and the CPP system. The performance of various
machine learning algorithms is evaluated to identify the most efficient one. The framework predicts an
accurate and profitable crop based on the profit and loss calculated from all types of expenses, from initial
cropping to final harvesting. It recommends the more profitable crop as its final output by integrating the
CYP and CPP models, as shown in Figure 1.
Figure 1. Proposed framework for ROI system


The proposed framework describes a crop recommendation system developed for the Yeola region, whose
121 villages are grouped into 6 circles. The recommendation includes the ROI dimension for two crops, cotton
and corn. The ROI value is calculated using the CYP system and the CPP system. Each of these systems goes
through data collection from different government agencies and the market committee, data analysis for
regression statistics, model deployment for multiple regression algorithms, performance evaluation, and a
final recommendation based on the provided input. The predicted crop price and crop yield are then used by the
ROI system. In this system a balance sheet is generated from all types of expenses, from initial cropping to
final harvesting. The profit or loss is then calculated to recommend the more profitable crop [19].


3. RESULTS AND DISCUSSION 

In this section, the proposed ROI framework is validated against existing regression techniques using
several performance metrics. The first subsection evaluates the multiple regression statistics and then tests
the significance of the predictors.


3.1. Performance evaluation for identifying optimized machine learning algorithm for crop yield 
prediction (CYP) system 
To design the CYP system, circle-wise data for the last three years (2018, 2019, 2020) were collected
for the Yeola region of Nashik district. The data were gathered from various digital sources and government
agencies for the 121 villages of the region. The parameters required in the data analysis for regression
statistics and significance testing are explained in detail in [20]. The data analysis uses multiple
regression statistics, including the standard error, as shown in Table 1. An analysis of variance (ANOVA) test
was then applied to check the significance level [21] of the input parameters crop year, rainfall, cultivation
area, and SFI, as shown in Table 2.


Table 1. Evaluation of regression statistics 


Regression parameters Regression statistics 
Multiple R 0.99991337
R Square 0.999826747 
Adjusted R Square 0.9998253 
Standard Error 12.64813767 
Observations 484 


Table 2. ANOVA test for input features 


Parameters df SS MS F Significance F 
Regression 4 4.42E+08 110553285.4 691064.3 1.16603E-19 
Residual 479 76628.21 159.9753866 
Total 483 4.42E+08 
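
The entries of Tables 1 and 2 are linked by standard identities, which the following minimal sketch checks using only the reported numbers. It assumes the usual textbook definitions (adjusted R Square for n observations and p predictors, F as the ratio of mean squares, and the standard error as the square root of the residual mean square); the variable names are illustrative and are not taken from the authors' tooling.

```python
import math

# Values reported in Tables 1 and 2 for the CYP regression.
n_obs = 484                  # Observations
n_pred = 4                   # predictors: crop year, rainfall, cultivation area, SFI
r_square = 0.999826747       # R Square
ms_regression = 110553285.4  # MS of the Regression row
ms_residual = 159.9753866    # MS of the Residual row

# Multiple R is the square root of R Square.
multiple_r = math.sqrt(r_square)

# Adjusted R Square penalizes R Square for the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_pred - 1)

# The F statistic is the ratio of regression to residual mean squares.
f_stat = ms_regression / ms_residual

# The standard error of the regression is the root of the residual mean square.
std_error = math.sqrt(ms_residual)

print(f"Multiple R        = {multiple_r:.8f}")
print(f"Adjusted R Square = {adj_r_square:.7f}")
print(f"F                 = {f_stat:.1f}")
print(f"Standard Error    = {std_error:.5f}")
```

Under these definitions the computed values agree with the tabulated ones to the reported precision, which is a quick way to confirm that the two tables describe the same fit.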


For the ANOVA test, the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$) are defined below.
The significance level is $\alpha = 0.05$: the predictors are considered significant, and the null hypothesis
is rejected, only when the P-value is below this threshold (Table 3).
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$, where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the
coefficients of Crop_Year, Rainfall, SFI, and Cultivation_Area.

$H_1$: at least one of $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ is nonzero; in that case accept $H_1$ and reject $H_0$.


Table 3. Data analysis for significance testing of predictors 


Parameters Coefficients Standard error t Stat P-value 
Intercept 2901.332 6.5836 0.4406 0.0065963 
Crop_Year -1.438531 3.2633 -0.4408 0.65955
Rainfall -0.000696 0.0078 -0.0888 0.0429219 
SFI 0.768672 1.4166 0.5425 0.0015876 
Cultivation_Area 3.924554 0.0023 1662.2 0.0362905


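The significance test shows that the overall multiple regression model is significant:
$F(4, 479) = 691064.3$ with Significance $F = 1.16603 \times 10^{-19}$, $P < 0.05$, and $R^2 = 0.999826747$ at
$\alpha = 0.05$. We therefore reject the null hypothesis $H_0$ and accept the alternative hypothesis $H_1$ for
the multiple regression equation, with the coefficients of Table 3:
$$Y = 2901.33 + (-1.438531382 \times \text{Crop\_Year}) + (-0.000696782 \times \text{Rainfall}) + (0.768672 \times \text{SFI}) + (3.924554098 \times \text{Cultivation\_Area})$$
where $Y$ is the crop yield. For example, the input (Crop_Year, Rainfall, SFI, Cultivation_Area) =
(2018, 301, 2.19, 567) gives $Y = 2225.072$.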
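
As a worked check, the fitted equation can be evaluated directly. The sketch below is only an illustration: the function and dictionary names are hypothetical, the intercept and coefficients are the ones listed in Table 3 (including the SFI coefficient 0.768672), and the input is the example point (2018, 301, 2.19, 567) quoted in the text.

```python
# Intercept and coefficients as reported in Table 3 for the CYP model.
INTERCEPT = 2901.332
COEFFICIENTS = {
    "crop_year": -1.438531382,
    "rainfall": -0.000696782,
    "sfi": 0.768672,
    "cultivation_area": 3.924554098,
}


def predict_crop_yield(crop_year, rainfall, sfi, cultivation_area):
    """Evaluate the fitted linear equation for a single observation."""
    features = {
        "crop_year": crop_year,
        "rainfall": rainfall,
        "sfi": sfi,
        "cultivation_area": cultivation_area,
    }
    # Linear model: intercept plus the sum of coefficient * feature value.
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())


example_yield = predict_crop_yield(2018, 301, 2.19, 567)
print(f"Predicted crop yield: {example_yield:.2f}")
```

With these coefficients the prediction lands within rounding distance of the value 2225.072 quoted for this input, and the cultivation area term dominates the result.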


3.2. Model deployment and performance evaluation of CYP system 

Seven machine learning algorithms, namely the sequential minimal optimization regressor (SMO-REG),
improved SMO regressor (ISMO-REG), multilayer perceptron neural network regressor (MLP-REG), bagging
regressor (BAGG-REG), Gaussian regressor (G-REG), random forest regressor (RF-REG), and AdaBoost regressor
(AB-REG), were used for multiple regression to predict the crop yield accurately [22]. These models are
evaluated using various performance metrics, as shown in Table 4.

The results show that SMO-REG, ISMO-REG, and MLP-REG perform better than the other algorithms. To
reduce the error further, hyperparameter tuning was applied to the MLP regressor using the stochastic gradient
method with learning rate and momentum parameters [23]. The results of the tuning process are shown in
Table 5. At learning rate η=0.5 and momentum M=0.2, the root mean squared error (RMSE) is reduced from 26.79
to 12.32, the lowest value among the tested settings. The hyperparameter tuning results for the regression
analysis of the crop data set [24] are presented graphically in Figures 2 to 4.
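
To make the roles of the two tuned parameters concrete, here is a minimal sketch of the classical momentum form of the stochastic gradient update, applied to a toy one-dimensional objective. It is an assumption that the authors' MLP training uses this exact form; the function name and the toy objective are illustrative only.

```python
def sgd_momentum(grad, w0, learning_rate, momentum, steps):
    """Minimize a 1-D function with the classical momentum update."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # New step = momentum * previous step - learning_rate * gradient.
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


def toy_grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)


w_final = sgd_momentum(toy_grad, w0=0.0, learning_rate=0.1, momentum=0.2, steps=200)
print(f"w after training: {w_final:.6f}")
```

The learning rate scales each gradient step, while the momentum carries over a fraction of the previous step, which is why the two parameters are tuned jointly.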

Table 4. Performance analysis of ML algorithm for CYP
ALGO COREL-COEF MAE RMSE RAE RRSE 
MLP-REG 0.9996 18.76 26.79 2.93% 2.80% 
ISMO-REG 0.9999 5.63 14.025 0.0812% 1.46% 
SMO-REG 0.9999 16.85 24.262 2.6386% 2.538% 
BAGG-REG 0.9956 19.85 94.699 3.107% 9.9% 
G-REG 0.9803 178.9 237.65 28.009% 2.42% 
RF-REG 0.9976 17.99 74.045 2.8158% 7.74% 
AB-REG 0.7386 435.3 644.41 68.135% 67.41% 
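
The metrics reported in the table above can be computed as in the following sketch. It assumes the common definitions in which the relative absolute error (RAE) and root relative squared error (RRSE) compare the model against a baseline that always predicts the mean of the actual values; the function name and the sample numbers are hypothetical and are not the study's data.

```python
import math


def regression_metrics(actual, predicted):
    """Return correlation coefficient, MAE, RMSE, RAE and RRSE for paired values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]

    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(sq_err) / n)
    # RAE and RRSE normalize the errors by those of the mean-value baseline.
    rae = sum(abs_err) / sum(abs(a - mean_a) for a in actual)
    rrse = math.sqrt(sum(sq_err) / sum((a - mean_a) ** 2 for a in actual))
    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p)
    return {"COREL-COEF": corr, "MAE": mae, "RMSE": rmse, "RAE": rae, "RRSE": rrse}


# Hypothetical yields, used only to exercise the function.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RAE and RRSE are usually reported as percentages, so a value of 0.25 from this function corresponds to 25%.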


Table 5. Optimized results after hyperparameter tuning
RMSE for each setting of learning rate η (rows) and momentum M (columns)

η \ M    0.1      0.2      0.3      0.4
0.1      26.79    29.21    28.32    27.49
0.2      25.57    25.10    24.73    24.45
0.3      26.66    26.79    26.01    28.52
0.4      29.98    23.47    12.52    12.90
0.5      12.52    12.32    12.83    13.44
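
The selection step behind Table 5 amounts to a grid search over (learning rate, momentum) pairs. The sketch below illustrates it with a stand-in evaluation function that simply looks up the reported RMSE values; in practice that function would train and validate the MLP regressor at each setting. All names are hypothetical, and the first entry of the momentum 0.4 block is assumed to belong to the pair (0.1, 0.4).

```python
from itertools import product

# RMSE values reported in Table 5, keyed by (learning_rate, momentum).
# Assumption: the first entry of the momentum-0.4 block is the pair (0.1, 0.4).
REPORTED_RMSE = {
    (0.1, 0.1): 26.79, (0.2, 0.1): 25.57, (0.3, 0.1): 26.66, (0.4, 0.1): 29.98, (0.5, 0.1): 12.52,
    (0.1, 0.2): 29.21, (0.2, 0.2): 25.10, (0.3, 0.2): 26.79, (0.4, 0.2): 23.47, (0.5, 0.2): 12.32,
    (0.1, 0.3): 28.32, (0.2, 0.3): 24.73, (0.3, 0.3): 26.01, (0.4, 0.3): 12.52, (0.5, 0.3): 12.83,
    (0.1, 0.4): 27.49, (0.2, 0.4): 24.45, (0.3, 0.4): 28.52, (0.4, 0.4): 12.90, (0.5, 0.4): 13.44,
}


def lookup_rmse(learning_rate, momentum):
    # Stand-in for training and validating the model at this setting.
    return REPORTED_RMSE[(learning_rate, momentum)]


def grid_search(evaluate, learning_rates, momenta):
    """Return the (setting, score) pair with the lowest score on the grid."""
    scores = {(lr, m): evaluate(lr, m) for lr, m in product(learning_rates, momenta)}
    best = min(scores, key=scores.get)
    return best, scores[best]


best_setting, best_rmse = grid_search(
    lookup_rmse, [0.1, 0.2, 0.3, 0.4, 0.5], [0.1, 0.2, 0.3, 0.4]
)
print(best_setting, best_rmse)
```

A grid search only finds the best setting among the values tried, so a finer or wider grid could still yield a lower error.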
Figure 2. Data analysis for LR (learning rate) vs.
Figure 3. RMSE curve for momentum=1.1 and 1.2
Figure 4. RMSE curve for momentum=1.3 and 1.4


3.3. Results and discussion on performance evaluation for crop price prediction system (CPP) 

The process for the CPP system is the same as the one described in the previous sections for CYP, so
only the results are presented here. The regression statistics are given in Table 6. The analysis of variance
(ANOVA) test results are shown in Table 7. The significance and hypothesis testing results are given in
Table 8. Model evaluation and selection are reported in Table 9.


Table 6. Evaluation of regression statistics for CPP
Regression parameters Regression statistics
Multiple R 0.999757
R Square 0.999513
Adjusted R Square 0.999511
Standard Error 32.95615
Observations 730


Table 7. ANOVA test for input features for CPP


Parameters df SS MS F Significance F 
Regression 4 1.62E+09 4.04E+08 372145.29 1.93E-26 
Residual 725 787428.4 1086.108 
Total 729 1.62E+09 


Table 8. Data analysis for significance testing of predictors 


Parameters Coefficients Standard error t Stat P-value 
Intercept 194.003 27.76939 -6.9862 6.40344E-12 
Min_Price 0.188153 0.045594 4.126717 4.10703E-05 
Max_price -0.04276 0.038175 -1.12017 0.263013278 
Commodity_Traded_Min 0.48156 0.051445 9.360633 9.73816E-20 
Commodity_Traded_Max 0.344199 0.038713 8.890943 4.76037E-18 


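The significance level is $\alpha = 0.05$: the predictors are considered significant, and the null
hypothesis is rejected, only when the P-value is below this threshold.
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$, where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the
coefficients of Min_Price, Max_price, Commodity_Traded_Min, and Commodity_Traded_Max.
$H_1$: at least one of $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ is nonzero; in that case accept $H_1$ and reject $H_0$.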


Table 9. Performance analysis of ML Algorithm for CPP 


ALGO COREL-COEF MAE RMSE RAE RRSE Time (sec)
MLP-REG 1 1.7433 2.0 0.1% 0.1% 256.32
ISMO-REG 1 4.2731 7.6 0.2% 0.51% 21.53
SMO-REG 1 6.5401 9.7 0.4% 0.65% 72.67
BAGG-REG 0.9999 13.112 21.63 0.8% 1.45% 0.08
G-REG 0.9975 339.59 350.58 22.8% 23.7% 5.04
ADDITIVE-REG 0.9999 14.103 16.97 0.9% 1.14% 0.07


The results of Table 9, i.e., the performance evaluation of the multiple regression models for CPP [25]
on each performance parameter, are presented graphically in Figures 5 to 9.


Figure 5. Performance evaluation for COREL-COEF
Figure 6. Performance evaluation for mean absolute error (MAE)


3.4. ROI value estimator based on crop yield prediction (CYP) and crop price prediction (CPP) 
analysis 
The ROI estimator module calculates profit and loss from the various expenses that farmers incur in
cultivating a crop. In this module, the crop yield and crop price are taken from the CYP and CPP modules
explained in the previous sections. For our experimental work we considered two crops, corn and cotton, for
the Yeola region.
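
The estimator described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ROI is taken here as profit divided by total cost, revenue as predicted yield times predicted price, and every number, unit, and name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CropEstimate:
    name: str
    predicted_yield: float  # output of the CYP model (hypothetical units)
    predicted_price: float  # output of the CPP model, per unit of yield
    expenses: dict          # balance-sheet items from cropping to harvesting

    def total_cost(self):
        return sum(self.expenses.values())

    def revenue(self):
        return self.predicted_yield * self.predicted_price

    def profit(self):
        # A negative value means a loss.
        return self.revenue() - self.total_cost()

    def roi(self):
        # Assumed definition: profit per unit of money invested.
        return self.profit() / self.total_cost()


def recommend(crops):
    """Pick the crop with the highest estimated ROI."""
    return max(crops, key=lambda crop: crop.roi())


# Hypothetical inputs, for illustration only.
corn = CropEstimate("corn", 40.0, 1500.0,
                    {"seed": 5000.0, "fertilizer": 10000.0, "labour": 15000.0})
cotton = CropEstimate("cotton", 20.0, 5000.0,
                      {"seed": 8000.0, "fertilizer": 12000.0, "labour": 20000.0})

best = recommend([corn, cotton])
print(best.name, best.profit(), best.roi())
```

Ranking by ROI rather than by absolute profit favors crops that return more per unit of money spent, which can matter for farmers with limited capital; ranking by profit is a one-line change in `recommend`.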


Int J Artif Intell, Vol. 11, No. 3, September 2022: 969-976 


Int J Artif Intell ISSN: 2252-8938


Figure 7. Performance evaluation for relative absolute error (RAE)
Figure 8. Performance evaluation for root relative squared error (RRSE)
Figure 9. Performance evaluation for time


4. CONCLUSION 

An ROI framework for a profitable crop recommendation system has been developed using an optimized
MLP regressor. By applying the stochastic gradient descent (SGD) method and tuning the hyperparameters, i.e.,
learning rate (η) and momentum (M), the RMSE is reduced from 26.79 to 12.32. Data analysis was applied to
obtain accurate regression statistics, which helps us select an appropriate model for the crop recommendation
system. Knowledge-based agriculture systems continue to benefit the earth and help people in various aspects
of life through crop management and yield improvement.